A Simple Evaluation Model for Feature Subset Selection Algorithms
Authors
Abstract
The aim of Feature Subset Selection (FSS) algorithms is to select, according to some importance criterion, a subset of the original features that describe a data set. To accomplish this task, FSS removes irrelevant and/or redundant features, as they may decrease data quality and degrade several of the desired properties of classifiers induced by supervised learning algorithms. As finding the best subset of features is an NP-hard problem, FSS algorithms generally use heuristics to select subsets. It is therefore important to evaluate the performance of these algorithms empirically. However, this evaluation needs to be multicriteria, i.e., it should take several properties into account. This work describes a simple model we have proposed to evaluate FSS algorithms. The model considers two properties: the predictive performance of the classifier induced using the subset of features selected by each FSS algorithm, and the reduction in the number of features. Another multicriteria performance evaluation model, based on rankings, which makes it possible to consider any number of properties, is also presented. The models are illustrated by their application to four well-known FSS algorithms and two versions of a new FSS algorithm we have developed.
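The abstract does not spell out how the ranking-based model combines properties, so the following is only a minimal sketch of one common approach, assuming that each algorithm is ranked separately on every property and the ranks are then averaged. The function names, the algorithm labels (`fss_a`, `fss_b`, `fss_c`), the property names, and all numbers are hypothetical placeholders, not values from the paper.

```python
from typing import Dict, List


def rank_values(values: List[float], higher_is_better: bool = True) -> List[float]:
    """Return 1-based ranks (1 = best); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend the run [pos, end] over all entries tied with order[pos].
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # average of the 1-based positions in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def average_ranks(scores: Dict[str, Dict[str, float]],
                  higher_is_better: Dict[str, bool]) -> Dict[str, float]:
    """Average each algorithm's rank over all listed properties (assumed scheme)."""
    names = list(scores)
    totals = {name: 0.0 for name in names}
    for prop, better_high in higher_is_better.items():
        prop_ranks = rank_values([scores[n][prop] for n in names], better_high)
        for name, r in zip(names, prop_ranks):
            totals[name] += r
    return {name: totals[name] / len(higher_is_better) for name in names}


# Hypothetical measurements: classifier error rate and the fraction of
# original features kept. Lower is better for both properties.
scores = {
    "fss_a": {"error_rate": 0.10, "feature_fraction": 0.50},
    "fss_b": {"error_rate": 0.12, "feature_fraction": 0.20},
    "fss_c": {"error_rate": 0.15, "feature_fraction": 0.80},
}
directions = {"error_rate": False, "feature_fraction": False}

result = average_ranks(scores, directions)
print(result)
```

Because every property is reduced to a rank before averaging, further properties (for example, running time) could be added to `directions` without rescaling the others, which is the sense in which a ranking-based scheme can handle any number of properties.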
Journal title:
- Inteligencia Artificial, Revista Iberoamericana de Inteligencia Artificial
Volume 10, Issue
Pages -
Publication date: 2006